Abstract 

As policy-makers contemplate expanding preschool opportunities for low-income children, one 
possibility is to fund two years of Head Start, rather than one, for children at ages 3 and 4. Another 
option is to offer one year of Head Start followed by one year of pre-k. We ask which of these 
options is more effective. We use data from the Oklahoma pre-k study to examine these two 
‘pathways’ into kindergarten using regression discontinuity to estimate the effects of each age-4 
program, and propensity score weighting to address selection. We find that children attending 
Head Start at age 3 develop stronger pre-reading skills in a high-quality pre-kindergarten at age 4 
than in Head Start at age 4. Pre-k and Head Start were not differentially linked 
to improvements in children’s pre-writing skills or pre-math skills. This suggests that some 
impacts of early learning programs may be related to the sequencing of learning experiences to 
more academic programming. 

Introduction 


In light of evidence that high quality early learning experiences can improve children’s 
school readiness and future academic success (Duncan & Magnuson, 2013; Yoshikawa et al., 
2013), a number of recent proposals at the federal and state levels would expand public early 
childhood education (ECE) programs. These initiatives aim to serve not just more children, but 
to also serve younger children, and to address the detrimental effects of poverty during early 
childhood on children’s wellbeing in the short and long term (Duncan, Magnuson, Kalil, & 
Ziol-Guest, 2012). This expansion includes the federal Head Start program, a comprehensive 
child development program that provides children with preschool education and other services, 
which children can enter as early as age 3. Indeed, 3-year-olds are the fastest-growing 
group of Head Start participants, increasing from 24 percent in 1980 to 40 percent in 2007, and 
comprising 63 percent of first-time Head Start children in 2010 (Aikens, Klein, Tarullo, & West, 
2013; Tarullo, Aikens, Moiduddin, & West, 2010). 

Expanding ECE programs to include younger children would increase the number of 
children participating in programs for multiple years. In fact, over half of all 3-year-old entrants 
now go on to complete two years of Head Start (Aikens et al., 2013). Others transition from 
Head Start at age 3 to state-created and implemented, academically-focused pre-kindergarten 
(pre-k) programs at age 4. The latter combination of programs is precisely what President 
Obama proposed in his 2013 early learning agenda: expand Head Start to serve 3-year-olds, 
while helping states to increase their educational investments in 4-year-olds. 

Unclear in the Head Start literature is whether the program is designed to provide two 
years’ worth of developmental benefits for children. In K-12 education, cross-grade curricula 
can be designed so that material taught in each grade builds on the skills and knowledge learned 
previously, and incremental benefits from each year of schooling for learning and labor market 
outcomes are well established (Card, 1999). However, we know little about whether ECE 
programs are designed to do the same. Furthermore, unlike primary education, where children are 
separated by grade, or state pre-k programs that serve only 4-year-olds, the Head Start model 
combines 3- and 4-year-olds in most classrooms (75% by one recent estimate; Hulsey et al., 
2011). If children in their second year of Head Start continue to receive more of the same 
activities rather than increasingly complex, differentiated learning experiences, they may gain 
less from a second year in the program relative to switching to a more academic pre-k program at 
age 4. 

The objective of this study is to answer one key question: If children participate in Head 
Start at age 3, is it more beneficial for them to remain in the program at age 4 or participate in a 
universal pre-k program at age 4? We use data from the study of the Oklahoma Pre-kindergarten 
program (OK pre-k) to compare outcomes for two different preschool ‘pathways’ to kindergarten 
(Gormley et al., 2005, 2008, 2010). One of these involves Head Start at both ages 3 and 4. The 
other involves Head Start at age 3 followed by OK pre-k at age 4. We use a regression 
discontinuity design with a strict age eligibility cutoff for program participation to estimate the 
effect of these pathways on children’s early academic skills at kindergarten. We apply 
propensity score weighting to the analyses to address selection into pathways and compare their 
effects on child outcomes. 

This study extends prior findings from these data in several ways. For academic outcomes, 
Gormley and colleagues estimated two separate regression discontinuity specifications (one for 
OK pre-k and one for Head Start), calculated treatment effect sizes, and compared effect sizes 
descriptively (Gormley, 2008; Gormley & Gayer, 2005; Gormley, Gayer, Phillips, & Dawson, 
2005). They compared two separately generated RD effect sizes using only a basic significance 
test (a difference in z-scores) (Gormley, Phillips, Adelstein, & Shaw, 2010; Paternoster, Brame, 
Mazerolle, & Piquero, 1998). In contrast, this study focuses on comparing the effectiveness of 
attending OK pre-k and Head Start at age 4 among age 3 Head Start graduates after pooling 
both pre-k and Head Start children into the same RD model, addressing differential selection into 
the programs. As such, this study is designed to make a rigorous statistical comparison between 
these two programs in a sample of children who attended Head Start at age 3, under key 
assumptions. 
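
For readers unfamiliar with the basic significance test mentioned above, the following is a minimal sketch of a difference-in-z-scores comparison between two separately estimated coefficients, z = (b1 - b2) / sqrt(SE1^2 + SE2^2), in the form given by Paternoster et al. (1998). The function names and the input values are hypothetical and chosen only for illustration; they are not estimates from the Tulsa data.

```python
import math


def coefficient_difference_z(b1, se1, b2, se2):
    """z statistic for the difference between two independently estimated coefficients."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z):
    """Two-sided p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect sizes and standard errors (illustration only).
z = coefficient_difference_z(b1=0.90, se1=0.15, b2=0.50, se2=0.20)
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.3f}")
```

This test treats the two estimates as independent and does not adjust for differences in who selects into each program, which is the limitation the pooled, weighted model in this study is meant to address.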

We find that, among children who attended Head Start at age 3, those who went on to OK 
pre-k at age 4 have better early reading outcomes at kindergarten than those who stayed in 
Head Start at age 4. This suggests that the 
impacts of early learning programs may be related to the sequencing of ECE programs to a more 
academic curriculum at age 4 and the extent to which the Head Start curriculum offers 
differential learning experiences to 4-year-olds who were, and were not, in the program at age 3. 

Background 

The effects of different types of early learning programs 

Head Start. Head Start is a comprehensive child development program that provides 
children with preschool education, health examinations, nutritious meals, and opportunities to 
develop social-emotional skills. This federal program targets very low-income families, and 
children who are at risk of entering school unprepared. Many studies have examined the benefits 
and long-term effects of Head Start, and there are several comprehensive and critical reviews of 
this literature, primarily using data for 4-year-old program participants (see Gibbs, Ludwig, & 
Miller, 2011, and Ludwig & Phillips, 2008, for reviews). 

Because of its use of random assignment, the experimental Head Start Impact Study 
provides the best evidence on the short-term impacts of Head Start on children’s language, 
literacy and early writing skills at ages 3 and 4. The end-of-program-year effect sizes average 
0.2 SD for both the age-3 and age-4 cohorts on early language and literacy skills, and 0.15 SD 
on early math skills for age-3 cohort participants (Puma, Bell, Cook, & Heid, 2010). 
Even though short-term gains appear to ‘fade out’, Ludwig and Phillips (2008) show that the short-term 
intent-to-treat effects are large enough for Head Start to pass a cost-benefit test. They 
calculate larger treatment-on-the-treated estimates for some key outcomes (e.g., letter-word 
identification effect sizes, where the intent-to-treat impact was 0.24 SD and the corresponding 
treatment-on-the-treated estimate was 0.35 SD). Strong quasi-experimental evidence on the 
effects of Head Start shows long-term benefits on academic outcomes, with effect sizes of 0.2- 
0.3 standard deviations (Currie & Thomas, 1995; Deming, 2009; Garces, Thomas, & Currie, 
2002). These studies looked at single-year impacts of Head Start only, whereas our study 
compares a 2-year Head Start experience to a 1-year Head Start-1-year pre-k experience. 

Pre-kindergarten. Pre-k programs are funded at the state or local level (typically by the state) to 
provide a year or two of education prior to kindergarten for children ages 3 or 4. Nationally, 28 
percent of all 4-year-olds were enrolled in state-funded pre-k across 40 states in 2010 compared 
with 11 percent of 4-year-olds enrolled in Head Start (Barnett, Carolan, Fitzgerald, & Squires, 
2011). However, “pre-k” does not have a standardized meaning with respect to children’s ECE 
experience because each state creates its pre-k programs independently, and, thus, the 
characteristics of these programs vary widely across states (Gilliam & Ripple, 2004; Jenkins, 
2014; Lombardi, 2003; Pianta & Howes, 2009). Some pre-k programs, such as Oklahoma’s, 
are recognized as very high quality and offer features such as frequent instructional interactions 
in subject-matter learning, teachers who are emotionally supportive of children and who are 
credentialed, and classroom environments that are well-organized, efficient with time 
management, and include developmentally appropriate learning materials (Burchinal, 1999; 
Mashburn et al., 2008; Phillips, Gormley, & Lowenstein, 2009; Pianta et al., 2005; Wong, Cook, 
Barnett, & Jung, 2008). For these reasons, the effects of any particular pre-k program cannot be 
generalized to state pre-k programs nationwide. 

A randomized study of the state pre-k program serving socioeconomically disadvantaged 
children in Tennessee found short-term gains in language, literacy and math outcomes for pre-k 
participants compared with children who did not participate, which was also confirmed by a 
regression discontinuity analysis (Lipsey, Farran, Bilbrey, Hofer, & Dong, 2011). Evaluations of 
Oklahoma’s and Boston’s pre-k programs also used regression discontinuity designs based on a strict age 
eligibility cutoff and found large short-term improvements in early reading, writing, math skills, 
and executive function (ES range = .36-.99) (Gormley, 2008; Gormley & Gayer, 2005; Gormley 
et al., 2005; Weiland & Yoshikawa, 2013). Using a similar regression discontinuity design, 
studies of pre-k programs in Arkansas (Hustedt, Barnett, & Jung, 2008) and a five-state pre-k 
comparison found positive effects for early reading, literacy, and math skills (ES range = .23-.96) 
(Wong et al., 2008). 

Other studies of the effects of pre-k programs have used propensity score (PS) methods, 
finding positive effects for programs in Chicago (Reynolds, Temple, Ou, Arteaga, & White, 
2011; Reynolds, Temple, Robertson, & Mann, 2001), Georgia (Henry, Gordon, & Rickman, 
2006) and in national samples (Magnuson, Ruhm, & Waldfogel, 2007), with lasting cognitive 
gains for the most disadvantaged children. Results from a meta-analysis (Camilli, Vargas, Ryan, & 
Barnett, 2010) and correlational studies (Howes et al., 2008; Huang, Invernizzi, & Drake, 2012) 
also show that children benefit from state pre-k programs. 

Comparing the effects of two types of programs: Head Start and pre-k. An important 
distinction between Head Start and pre-k lies in their program goals. Head Start mandates a 
“whole-child” approach that aims to comprehensively support children’s development across several 
outcome domains, whereas pre-k programs, particularly Tulsa’s program, often focus on 
children’s early academic skills to prepare children for the academic nature of kindergarten. 
These differences may result in differential program effects across the broad scope of children’s 
outcomes. 

Despite the large body of research on the effectiveness of individual types of ECE 
programs in improving children’s early academic skills, relatively few studies have directly 
compared the effectiveness of Head Start and different state pre-k programs. Henry and 
colleagues (2006) use propensity score matching to address selection and compare Head Start to 
Georgia’s pre-k program, finding that state pre-k participants had statistically significant but only 
modestly higher scores at kindergarten entry relative to similar Head Start participants. Gormley 
and colleagues (2010) calculate separate RD estimates for each age-4 program in Tulsa, OK, and 
find larger effects for OK pre-k participants than for Head Start. The effects of Head Start and 
pre-k vary depending on the comparison treatment condition (Ludwig & Phillips, 2008). Zhai, 
Brooks-Gunn, and Waldfogel (2011) use PS to match Head Start children to children in different 
ECE programs and find that Head Start was associated with improved cognitive and social 
outcomes when compared with children who received parental care or other non-center-based 
care. However, when compared with children who attended pre-k programs (across different 
states) and center-based care, Head Start children had better social but not academic outcomes. 
In this study, we compare the outcomes of age 4 Head Start and age 4 universal pre-k 
participants at kindergarten entry for a sample of children who attended Head Start at age 3. 

Duration and dosage effects of ECE 

The influence of program duration on children’s outcomes is essential for understanding 
whether two years of Head Start would be more beneficial for children than one year of Head 
Start followed by one year of pre-k. More than half of the children who enter Head Start at age 3 
will stay for an additional year (Tarullo et al., 2010), yet the research on duration in Head Start, 
and ECE more generally, is limited. The evidence from experimental and non-experimental 
studies suggests that on balance, more participation in center-based ECE is associated with 
stronger cognitive outcomes, especially for low-income children (Behrman, Cheng, & Todd, 
2004; Campbell, Pungello, Miller-Johnson, Burchinal, & Ramey, 2001; Dearing, McCartney, & 
Taylor, 2009; Hill, Brooks-Gunn, & Waldfogel, 2003; Loeb, Fuller, Kagan, & Carrol, 2004). 
However, the incremental effect of attending a first year of preschool is generally greater in 
magnitude than that of a second year for children’s short- and long-term outcomes (Arteaga, 
Humpage, Reynolds, & Temple, 2014; Reynolds et al., 2011; Tarullo, Xue, & Burchinal, 2013). 
In addition, some research indicates potentially adverse consequences of long hours of care on 
social and behavioral outcomes in conjunction with positive academic and achievement effects 
(Belsky et al., 2007; Datta Gupta & Simonsen, 2010; Loeb, Bridges, Bassok, Fuller, & 
Rumberger, 2007; Magnuson et al., 2007; Vandell et al., 2010). And while intensive early 
learning interventions such as Abecedarian and Perry Preschool provided 2 to 5 years of program 
services and produced significant effects (Campbell et al., 2001; Schweinhart, 2005), other 
preschool programs produce substantial effects in only 1 year of services (Gormley et al., 2005). 

The Head Start duration research is equivocal, with some indication that two years are 
more advantageous than one, but not ‘twice’ as advantageous.¹ A number of studies in this area 
use PS methods to address possible bias due to selection into dosage. Burchinal and colleagues 
(2013) use the 2006 and 2009 FACES data and find that children who entered Head Start at age 3 and 
also participated at age 4 had modestly higher vocabulary scores relative to children who 
participated in Head Start at age 4 only, with the gains from the second year being much smaller 
than the first (ES of second year = 0.10-0.17). Another PS study uses the 2003 FACES 
data, finding larger effects of 2-year Head Start participation (ES = .27-.80) (Wen, Leow, 
Hahs-Vaughn, Korfmacher, & Marcus, 2012). Other PS (Domitrovich et al., 2013; Skibbe, Connor, 
Morrison, & Jewkes, 2011) and correlational studies of Head Start (Lee, 2011) also find slightly 
larger gains for 2 years over 1 year. 

On the other hand, PS analyses of the Chicago Parent Child ECE program did not show 
significant additional benefits for 2 years of participation versus 1 year (Reynolds, 1995; 
Reynolds et al., 2011). The authors suggest that the program model may have provided 
redundant instruction for two-year participants. Barnett and Lamy also find no influence of 
duration in a pre-k program on print awareness and math, with some small effects for vocabulary 
(2006). Nores and Barnett conduct a meta-analysis of dosage effects across an international 
sample of ECE programs and find that programs lasting 1 to 3 years had average effect sizes of 
0.3 standard deviations, as compared with 0.2 for programs lasting less than 1 year, with a 
maximum effect size of 0.3 at 3 years or more (2010). 

If longer exposure produces better outcomes, then 2 years of Head Start may be money 
well spent. But the literature does not provide consistent support for the notion that 2 years is 
better than 1, or that individual ECE programs are designed to provide multiple years of unique, 
developmentally appropriate, incremental learning. Thus, it may be that children continue to gain 
skills in a second year of Head Start, but they could gain even more by switching to a more 
academic age 4 program: state pre-k. Testing this is the goal of our study. 

Possible Curricular and Peer Effects 

Pre-k and Head Start program models differ in several ways. Our study cannot examine 
which of these components may make a difference in children’s outcomes because they are 
confounded with program type. However, two noteworthy differences are curricula and 
classroom peer composition. 

Curricula. As a part of the Tulsa pre-k study, Phillips, Gormley, and Lowenstein (2009) 
examined classroom characteristics in pre-k and Head Start. A key finding from their study was 
that the quality ratings for both programs were in the good-to-high range based on standard 
observational measures, higher than the national averages of both program types (Dotterer, 
Burchinal, Bryant, Early, & Pianta, 2012; Moiduddin, Aikens, Tarullo, West, & Xue, 2012). The 
only differences that emerged between the two programs were the curricula teachers reported 
using. Thus, curricula and related instructional practices may be an important distinction between 
the two programs. 

In addition to differences in curricular approaches, the extent to which the curriculum used 
in Head Start classrooms differentiates children’s age 3 and age 4 learning experiences would 
influence both the Head Start dosage effect and the comparative effect of Head Start to OK pre-k 
(Yoshikawa et al., 2013). A majority of Head Start classrooms combine 3- and 4-year-olds. 
Consequently, age 3 Head Start graduates are very likely staying in the same classroom, with the 
same teacher, books, and other materials during their second year. If Head Start instruction is 
also the same during children’s second year, Head Start children may not receive increasingly 
complex, differentiated learning experiences on a regular basis, which are critical for intellectual 
development (Bronfenbrenner, 1989). Indeed, recent work suggests that kindergarten teachers 
spending time on math skills students have already mastered has a negative effect on students’ 
math achievement (Engel, Claessens, & Finch, 2013). 

We know relatively little about whether Head Start curricula are hierarchical in practice 
and evolve as children age, because of the variation in curricula and limited support of their 
efficacy (Clifford & Crawford, 2009). The Head Start program mandates that program curricula 
focus on the “whole child,” where learning occurs through participating in activities. According 
to FACES data from 2000 to 2009, the most common curriculum used in Head Start classrooms 
is the Creative Curriculum (used by 46% of teachers), followed by High/Scope (19%), a 
number of other widely available whole-child curricula (e.g., Scholastic, High Reach, 
Montessori) (13%), and other less commonly used curricula (e.g., Galileo, Houghton Mifflin, 
Links to Literacy) (20%). A study of pre-k programs also found that Creative Curriculum and 
High/Scope are the most frequently used curricula in pre-k programs (Clifford et al., 2005); 
Creative Curriculum was also used in the OK pre-k program, though the most common 
curriculum reported by teachers was integrated thematic instruction (Phillips et al., 2009). 

Surprisingly, there is little empirical support for High/Scope and none for Creative Curriculum, 
and neither curriculum, as currently used, has demonstrated effectiveness based on rigorous 
statistical standards (U.S. Department of Education, 2013). In addition, most ECE practitioners 
are convinced that whole-child instruction through discovery learning is best for young children 
based on theoretical models such as Piaget’s, but limited evidence supports this assumption. 
Indeed, recent evidence from the Boston Pre-K evaluation suggests the opposite. Boston’s 
highly effective pre-kindergarten program uses several domain-specific curricula that focus on 
presenting lessons that become increasingly complex and build on the inherent hierarchy of skills 
within that domain (Klein, Starkey, Clements, Sarama, & Iyer, 2008; Weiland & Yoshikawa, 
2013). Results from recent studies also indicate that children who receive targeted or content- 
specific curricula (e.g., literacy or math) during preschool show moderate to large improvements 
in the targeted content domain (e.g., Clements & Sarama, 2008; Lonigan, Farver, Phillips, & 
Clancy-Menchetti, 2011). Curricula effectiveness also depends on the extent that teachers 
implement them with fidelity. 

This variation in curricula, their limited efficacy, and the unknown degree to which 
learning activities change as children age highlight the ambiguity of the impact of the second- 
year Head Start experience. As explained by Reynolds in his study of dosage in the Chicago 
Parent Child program, “an additional year that simply repeats learning activities of the first year 
would not be expected to make much difference” (1995, p. 23). In contrast, the OK pre-k 
program may be an opportunity for age 3 Head Start participants to receive a novel age-4-specific 
learning experience and avoid any redundancy in the Head Start whole-child curriculum. 
Curricula packages, including Creative Curriculum and High/Scope, provide curricular supports to 
individualize instruction for children within a classroom, but it is unclear whether teachers use 
these resources and adjust their instruction accordingly, especially in mixed-age settings. While 
we lack information on the classroom characteristics in our Tulsa Head Start and pre-k data, we 
simply wish to highlight the important role that curricular differences may play in accounting for 
differential effects of the two pathways. 

Peer effects. Classroom composition and peer effects may also play a role in creating 
differential effects of the two pathways. Head Start programs are available to very low-income 
3- and 4-year-old children, whereas the OK pre-k program is universally available to 4-year-old 
children only, regardless of income. These two program features create differences in both 
the distribution of children’s ages and the distribution of family income in the classroom, either 
of which can influence children’s outcomes through peer effects. 

For practical reasons, Head Start classrooms often combine 3- and 4-year-olds. While child 
development and educational theorists have supported the use of mixed-age classrooms 
(Bandura, 1986; Katz, 1990; Montessori, 1917; Vygotsky, 1978), the empirical research in this 
area is equivocal; some studies show limited positive effects (Blasco, Bailey, & Burchinal, 1993; 
Urberg & Kaplan, 1986), but several studies find null or negative effects of mixed-age settings 
(Bailey, McWilliam, Ware, & Burchinal, 1993; Bell, Greenfield, & Bulotsky-Shearer, 2013; 
Hattie, 2002; Moller, Forbes-Jones, & Hightower, 2008; Winsler et al., 2002). 

The more important feature of mixed-age classrooms may be that a one-year age difference 
during early childhood can create substantial variation in the classroom’s distribution of 
children’s skills. In turn, the skill level of classroom peers can substantially affect children’s skill 
development because teacher-directed activities are often kept to a minimum in ECE. Henry and 
Rickman (2007) study peer effects in preschool children and find that having peers with higher 
cognitive skills produced positive effects on children’s early math, literacy and language skills. 
Others find beneficial peer effects for preschool children with low baseline skills (Justice, 
Petscher, Schatschneider, & Mashburn, 2011), but also for preschool children with high baseline 
skills (Mashburn, Justice, Downer, & Pianta, 2009). Studies also suggest positive peer effects on 
math and reading achievement for school-aged children (Cascio & Schanzenbach, 2012; Chetty 
et al., 2011; Elder & Lubotsky, 2009; Hanushek, Kain, Markman, & Rivkin, 2003; Hoxby & 
Weingarth, 2005; Zimmer & Toma, 2000). 

In our study, it is possible that the classroom compositions in both age-4 preschool 
environments could have different and opposing peer effects on the age-4 learning experiences 
of Head Start graduates. If second-year Head Start children have more advanced skills than their 
new classmates, acquired during the first year of Head Start, this could benefit the first-time 
Head Start age 4 children through peer learning, increasing the rate at which age 4-only 
children can catch up to their second-year peers (Winsler et al., 2002). Simultaneously, younger 
age 3 peers in mixed-age Head Start classrooms could slow additional progress for second-year 
students through behavioral disruption, through an absence of positive academic peer effects, or, 
relating to the curriculum issue, through the level of content teachers present based on the group’s overall 
ability (Betts & Shkolnik, 2000; Hoxby & Weingarth, 2005; Lavy, Paserman, & Schlosser, 2012; 
Moller et al., 2008). In this situation, Head Start students from the age 3 cohort provide positive 
peer effects for children from the age 4 cohort, but derive no personal benefit from peer effects. 
Both mechanisms would reduce the added benefits of children’s second year in Head Start. 

On the other hand, the age 3 Head Start graduates attending OK pre-k at age 4 may be the 
beneficiaries of positive peer effects because the OK pre-k program is universal, and classroom 
compositions may be more mixed in terms of children’s socioeconomic backgrounds (Reid & 
Ready, 2013). Because poor and low-income children have substantially lower school-readiness 
skills than their higher income peers, peer effects in mixed socioeconomic classrooms are 
particularly valuable for the most disadvantaged children (Barnett & Belfield, 2006; Hart & 
Risley, 1995; Henry et al., 2006; Rouse, Brooks-Gunn, & McLanahan, 2005; Schechter & Bye, 
2007; Zimmer & Toma, 2000). Still, it is possible that universal pre-k classrooms in 
economically segregated neighborhoods are not actually socioeconomically diverse (Dotterer et 
al., 2012). 

These two opposing peer effects (second-year Head Start children as benefactors and 
Head Start graduates in OK pre-k as beneficiaries) would attenuate the overall effect of Head Start. 
With our dataset, we are not able to estimate the effects of peers in an empirical model, and 
Phillips et al. did not explore classroom peer composition in their study of OK pre-k classroom 
characteristics (2009). However, we do describe some of the conditions likely determining peer 
effects. 

On balance, we judge that prior findings and the likely direction of curricular and peer 
effects argue that age-3 Head Start graduates will have stronger early academic skills if they 
participate in the OK pre-k program at age 4 relative to children who stay in Head Start for a 
second year at age 4. It is important to know whether children would be better off in one age 4 
preschool experience over another, especially since this particular pathway (Head Start at age 3 
followed by state pre-k at age 4) is the plan promoted by the Obama administration, and 
appears to be the direction in which national policy is evolving. 

Methods 

Research design and analysis 

Our research question is as follows: If children participate in Head Start at age 3, do they 
have better early academic skills at kindergarten entry if they stay in Head Start for an additional 
year at age 4 or if they participate in a high-quality state pre-k program at age 4? Answering this 
question involved two analytic processes: estimating treatment effects for each pathway and 
addressing selection into age 4 treatments. We estimated treatment effects using a regression 
discontinuity model. We applied propensity score weighting to the regression discontinuity 
model to make the groups as comparable as possible. 
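
To make the combination of these two steps concrete, the following is a minimal sketch of a pooled regression discontinuity model estimated by weighted least squares on simulated data. Everything specific in it is an assumption made for illustration: the variable names, the linear form of the age term, the inverse-probability weights, and the simulated propensity scores and outcomes. The specification actually used in this study is the one described in Appendix 1.

```python
import numpy as np


def weighted_rd_fit(y, age_centered, treated, prek, weights):
    """Weighted least squares for a pooled RD model (illustrative specification).

    Columns: intercept, treated, prek, treated * prek, age_centered.
    The treated * prek coefficient is the pre-k minus Head Start difference
    in the estimated effect of the age-4 program.
    """
    X = np.column_stack(
        [np.ones_like(y), treated, prek, treated * prek, age_centered]
    )
    sw = np.sqrt(weights)  # scaling rows by sqrt(w) turns OLS into WLS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 500
age_centered = rng.uniform(-1.0, 1.0, n)      # years from the birthday cutoff (simulated)
treated = (age_centered >= 0).astype(float)   # children past the cutoff completed the program
prek = rng.integers(0, 2, n).astype(float)    # 1 = OK pre-k pathway, 0 = Head Start pathway
pscore = rng.uniform(0.2, 0.8, n)             # hypothetical estimated P(prek = 1 | covariates)
weights = np.where(prek == 1, 1 / pscore, 1 / (1 - pscore))

# Simulated outcome with a known differential effect of 0.3 (not a study result).
y = (1.0 + 0.5 * treated + 0.2 * prek + 0.3 * treated * prek
     + 0.1 * age_centered + rng.normal(0.0, 0.01, n))

beta = weighted_rd_fit(y, age_centered, treated, prek, weights)
print("estimated differential effect:", round(float(beta[3]), 3))
```

Pooling both programs into one model lets the difference between pathways be read from a single interaction coefficient, with its own standard error, rather than from two separately estimated effect sizes.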

We used a dummy variable approach to deal with missing data.² All analyses were 
conducted using Stata 12 (StataCorp., 2011). We briefly describe the intuition of these 
procedures here and present the methodological details in Appendix 1, and supplemental figures 
and calculations in Appendix 2. 
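
As an illustration of the dummy variable approach to missing data, the sketch below replaces missing values of a covariate with a constant and adds an indicator that flags which values were filled in, so that both columns can enter a regression. The function name, the fill value of zero, and the example covariate are assumptions made for illustration and are not taken from the study's data or code.

```python
import numpy as np


def add_missing_indicator(x, fill_value=0.0):
    """Return (filled covariate, missingness dummy) for a numeric covariate."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = np.where(missing, fill_value, x)
    return filled, missing.astype(float)


# Hypothetical covariate with two missing values.
covariate = [1.2, float("nan"), 0.8, float("nan")]
filled, flag = add_missing_indicator(covariate)
print(filled.tolist(), flag.tolist())
```

Including both columns keeps every child in the analytic sample while letting the model absorb any mean difference associated with missingness.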

Data 

Participants. The evaluation focused on the children enrolled in the Tulsa pre-k programs 
in 2006-07, using the data from the Tulsa Preschool Study 2006-07 Public Use Data File. This 
evaluation of Oklahoma’s state-funded universal pre-k program administered in Tulsa Public 
Schools, and the Tulsa County Head Start program administered by local Community Action 
Project sites was conducted by a team from Georgetown University who made the data public 
(Gormley, 2011). The data come from four sources: direct cognitive assessments of children at 
the beginning of the school year; parent surveys collected at their child’s cognitive assessment; 
social-emotional assessments conducted by each child’s teacher; and administrative data from 
Tulsa Public Schools and Head Start. 

Our research questions focused on the children eligible for free or reduced-price lunch who 
attended Head Start at age 3 (n=540). Among these children, the analysis data set includes 
students who were entering OK pre-k, age-4 Head Start, or OK public school kindergarten in 
the 2006-07 school year. The two preschool pathways we created and their sample sizes are: 1) 
participants in OK pre-k at age 4 who participated in Head Start at age 3 (211 total; 88 
kindergarten entrants and 123 pre-k entrants), and 2) participants in Head Start at age 4 and age 3 
(329 total; 119 kindergarten entrants and 210 Head Start entrants). Ninety-two percent of the OK pre-k 
children in our sample attended full-day pre-k (6.5 hours), making these participants as similar as 
possible to Head Start participants, because Head Start was a full-day program in Tulsa. Child and family 
characteristics for both groups are presented in columns 1 and 2 of Table 1. 

We also examined whether our analytic sample was representative of the Tulsa 
kindergarten population. In Appendix 2.1, we present descriptive statistics for kindergarten
children who attended OK pre-k or Head Start and other Tulsa kindergarten children in the
Tulsa pre-k study file. This table reveals that, in general, children attending one of the public
preschool programs are more disadvantaged than their non-participating peers. They are more
likely to be low-income, to be Black, and to speak a language other than English in the home,
and are less likely to have internet access at home and to have parents who are married.

Measures. Child academic assessments occurred in August 2006 and included three 
academic subtests from the Woodcock-Johnson Achievement Tests-III (Woodcock, McGrew, & 
Mather, 2001). The Letter-Word Identification subtest measures early reading skills, whereby 
children are asked to identify letters and pronounce words. The Spelling subtest requires 
children to trace letters, write letters in upper and lowercase, and to spell words, measuring early 
writing and spelling skills. The Applied Problems test has children perform simple calculations 
to solve math problems, which assesses children’s early mathematical thinking with respect to 
counting, cardinality, and early operational skills. The reliability coefficient for the 3- to
5-year-old age group ranges from .97 to .99 (Woodcock et al., 2001). The same subtests of a comparable
Spanish test, the Woodcock-Muñoz Batería, were given to Hispanic students capable of being
tested in Spanish. The assessment values are raw scores and are not nationally normed.
Further detail regarding the sample, procedures, measurement, and assessments is available in
Gormley et al. (2005).

[Insert Table 1] 

1. Estimating treatment effects: Regression discontinuity design 

Our study implements a regression discontinuity (RD) design, a method designed to 
provide unbiased estimates of treatment effects under certain conditions. The RD technique 
exploits the fact that the OK preschool programs enforced a strict age cutoff for participation
based on child’s birth date, so that children who turned 4 before the cutoff (September 1 of the
2005-06 school year) were eligible to participate in the OK pre-k and age-4 Head Start programs,
and children who turned 4 after the cutoff were not. The primary condition for conducting an RD
analysis is the use of a quantitative assignment variable with a designated cutoff score that
determines exposure to treatment (Imbens & Lemieux, 2008; Shadish, Cook, & Campbell, 2002).
Therefore, in our analysis, child age, measured as the distance in days between the child’s
birthdate and the cutoff birthdate, is the assignment variable for the RD specification. This particular
RD design is referred to as an “age-cutoff” RD, and has been widely adopted for studying the
effects of public prekindergarten programs (Lipsey, Weiland, Yoshikawa, Wilson, & Hofer, 2014;
Wong et al., 2008). Figure 1 shows the discontinuity in treatment status by age for the age
4 OK pre-k and age 4 Head Start groups.
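
The construction of the assignment variable can be sketched as follows. This is a minimal illustration, not the study’s code: the cutoff birthdate of September 1, 2001 is inferred from the text (children had to turn 4 by September 1, 2005), and the sign convention, the treatment of a birthday falling exactly on the cutoff, and the function names are assumptions.

```python
from datetime import date

# Assumption: turning 4 by September 1, 2005 means being born on or before
# September 1, 2001. Whether the cutoff day itself counts is also assumed.
CUTOFF_BIRTHDATE = date(2001, 9, 1)


def assignment_variable(birthdate: date) -> int:
    """Days between the cutoff birthdate and the child's birthdate.

    Positive values mean the child was born before the cutoff (older child).
    """
    return (CUTOFF_BIRTHDATE - birthdate).days


def made_cutoff(birthdate: date) -> int:
    """1 if the child was age-eligible for the 2005-06 age-4 programs, else 0."""
    return 1 if assignment_variable(birthdate) >= 0 else 0


older_child = date(2001, 8, 2)     # born 30 days before the cutoff
younger_child = date(2001, 10, 1)  # born 30 days after the cutoff
print(assignment_variable(older_child), made_cutoff(older_child))
print(assignment_variable(younger_child), made_cutoff(younger_child))
```

Under this convention the older child falls in cohort 1 (treated in 2005-06) and the younger child in cohort 2 (just entering an age-4 program).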

Using RD to compare the mean outcomes of children who made the cutoff to those who did 
not provides ‘pseudo’ pre- and post-test measures for OK pre-k and Head Start because all 
children in the study—those who made the cutoff and those who missed the cutoff—were 
assessed at the same time (August 2006). The RD sample includes two cohorts of children; 
cohort 1 children are 5-6 years old and are entering kindergarten at the outcome assessment date, 
and cohort 2 children are 4-5 years old and are entering a preschool program at the outcome 
assessment date. Therefore at the time of testing, cohort 1 was treated by Head Start or OK pre-k 
during the 2005-06 school year (i.e., born before the cutoff), and cohort 2 had not yet
participated in either age-4 program (i.e., born after the cutoff). Because the children in cohort 2
had selected into either age 4 Head Start or OK pre-k at the testing date, the members of cohort 2
entering pre-k or Head Start in 2006-07 can serve as the pre-test comparison group for cohort 1
children who completed the same program. The intuition here is that our RD estimates
within-pathway changes in children’s outcomes by comparing the mean outcomes of the two cohorts.

The important feature of this between-cohort, within-pathway comparison using RD is 
that the pathway treatment effects are identified by comparing the average outcomes for children 
with birthdays just above and below the cutoff date. This difference in mean outcomes at the 
cutoff point is captured by a dichotomous indicator variable (i.e., making the treatment cutoff=1)
shown in the model below. Therefore, a key assumption of this RD model is that the children on 
either side of the cutoff differ only in age, and are otherwise comparable (with respect to 
potential outcomes), known as the local conditional independence assumption (Van Der Klaauw, 
2008). All other characteristics of these individuals can be considered independent of treatment 
status, and therefore should be ‘smooth’—not discontinuous—around the cutoff. One can test 
this assumption by comparing the means of observed characteristics within a bandwidth around 
the treatment cutoff. We did this for observations very close to the cutoff (90-day bandwidth) 
and for the full analysis sample (270-day bandwidth) for each pathway (shown in Appendix 
2.11). We find that across all variables included in the models there were very few significant 
relationships between child cohort and child and family covariates within each pathway when the 
IPT weights are applied. 
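
The smoothness check described above amounts to comparing weighted covariate means on either side of the cutoff within a chosen bandwidth. The sketch below shows that comparison on made-up records; the field names (`age_days`, `weight`, `hispanic`) and the toy values are hypothetical and are not taken from the study data.

```python
def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def covariate_gap(rows, covariate, bandwidth_days):
    """Weighted mean of `covariate` for children who made the cutoff minus
    the weighted mean for those who missed it, within the bandwidth."""
    near = [r for r in rows if abs(r["age_days"]) <= bandwidth_days]
    treated = [r for r in near if r["age_days"] >= 0]
    untreated = [r for r in near if r["age_days"] < 0]
    mean_treated = weighted_mean(
        [r[covariate] for r in treated], [r["weight"] for r in treated])
    mean_untreated = weighted_mean(
        [r[covariate] for r in untreated], [r["weight"] for r in untreated])
    return mean_treated - mean_untreated


# Toy records: age_days is the distance from the cutoff (positive = older).
toy = [
    {"age_days": 10, "hispanic": 1, "weight": 1.0},
    {"age_days": 50, "hispanic": 0, "weight": 1.0},
    {"age_days": 200, "hispanic": 1, "weight": 1.0},
    {"age_days": -20, "hispanic": 1, "weight": 2.0},
    {"age_days": -80, "hispanic": 0, "weight": 2.0},
]
gap_90 = covariate_gap(toy, "hispanic", 90)
gap_270 = covariate_gap(toy, "hispanic", 270)
print(gap_90, gap_270)
```

A gap near zero at both bandwidths is what the smoothness assumption would lead one to expect; a formal test would also need a standard error, which this sketch omits.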

We also tested for the smoothness of covariate means around the cutoff graphically. In 
Figure 2b, 2c, and 2d, we show histograms of covariate proportions for Hispanic, reduced-price 
lunch, and parents with a High School degree or higher, near the cutoff. These figures illustrate 
that the distributions of children’s observable characteristics are similar on both sides of the 
cutoff. Because the composition of covariates is similar across the cutoff (i.e., cohorts) within 
each pathway, these two diagnostics also indicate that our sample is not biased by differential 
attrition between the preschool and kindergarten years, which is central to the smoothness
assumption in age-cutoff RDs (Lipsey et al., 2014). We also used the histograms in Figure 2 to
ensure that observations were not disproportionately clustered near the cutoff. 

Because age—measured as distance from the birthdate cutoff—is included in the analysis 
model, this removes any age-related contributions to differences in outcomes so that, conditional 
on other covariates, all that remains is the effect of the age-4 program. That is, regression 
adjustment removes the effects of age for those in each cohort, so their outcome is adjusted to 
what it would have been as follows: The older students within cohort 1 (who have completed the 
preschool program) have their scores adjusted back to what they would have been at their 5th
birthday, and since these adjusted scores include the effect of the preschool program, they can be
used as post-test measures. The younger students within cohort 2 have their scores adjusted
forward to what they are expected to be at their 5th birthday, and since these adjusted scores do
not include the effect of the preschool program they are just entering, they can be used as pretest
measures. 3 The effect identified in the RD model is an average treatment effect that generalizes
to the cases closest to the cutoff, which are therefore most similar in potential outcomes; this is
also known as a local average treatment effect (Angrist & Pischke, 2008).

Model specification. We estimated the RD models using Ordinary Least Squares 
regression with PS weights (described below) to generate local average treatment effects of each 
pathway and to test for pathway differential effects on outcomes at kindergarten entry. In 
combining this estimand with that of propensity score methods, which estimate the average 
treatment effect for treated cases, we refer to our estimand as a local average treatment effect on 
the treated. Comparing two different exposures with RD involved a nuanced RD specification. 
We include an interaction term between the treatment indicator (birthdate occurs before the 
cutoff=1) and an indicator for one of the two pathways (cutoff * age 3 and age 4 Head Start) to
test for differential effects between the two exposures. The model also controls for parent’s
education, child race, sex, reduced-price lunch status, exposure to other non-parental care
(yes=1), and missing data indicators, presented below:

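$$
Y_{ijc} = \alpha + \beta_1 \mathrm{Cutoff}_{ic} + \beta_2 (\mathrm{Cutoff}_{ic} \times \mathrm{HS}_{ic}) + \beta_3 \mathrm{HS}_{ic} + \beta_4 (\mathrm{Age}_{ic} - Q) + \beta_5 (\mathrm{Age}_{ic} - Q)^2 + Z_{ic} + \varepsilon_{ic}
$$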

where Y is one of three early academic skill outcome measures (j), indexed by child (i) and
classroom (c). Cutoff is a dichotomous indicator of whether the child’s birthdate occurs before
the eligibility cutoff for OK pre-k or Head Start and equals 1 if the child was treated. OK pre-k is
the reference group and only the indicator for Head Start (at age 4) is included ($\beta_3$). Therefore,
the differential treatment effect for age 4 Head Start, our coefficient of interest, is indicated
by $\beta_2$, which is an interaction between the cutoff indicator (treated) and the Head Start indicator.
A linear combination of $\beta_1 + \beta_2$ represents the (local) average treatment effect for Head Start,
whereas $\beta_1$ represents the (local) average treatment effect for OK pre-k, the reference group.
$\beta_4$ is the effect of the quantitative assignment variable, age, which is measured in days and is
centered at the birthdate cutoff Q (September 1). $\beta_5$ is the coefficient on the quadratic term in
age and Z is a vector of control variables. The error term is indexed by child and classroom to
reflect our classroom clustered standard errors. An RD specification comparing two separate
discontinuities as we do here ($\beta_2$) is also referred to as a “difference-in-discontinuities”
design (Grembi, Nannicini, & Troiano, 2012).
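
To make the roles of the cutoff and interaction coefficients concrete, the sketch below fits a weighted least squares version of the specification to synthetic, noiseless data generated from known coefficients. It is only an illustration under stated assumptions: it omits the control vector Z and the classroom-clustered standard errors, rescales age from days to years purely for numerical convenience, uses arbitrary numbers in place of the IPT weights, and all function names are hypothetical. The analyses in the paper were run in Stata.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(cutoff, hs, age_days):
    """Columns: intercept, Cutoff, Cutoff*HS, HS, centered age, age squared."""
    age = age_days / 365.25  # days -> years, only to keep the toy system well scaled
    return [1.0, float(cutoff), float(cutoff * hs), float(hs), age, age ** 2]


def weighted_ols(x, y, w):
    """Coefficients minimizing the weighted sum of squared residuals."""
    k = len(x[0])
    xtwx = [[sum(wi * r[a] * r[b] for r, wi in zip(x, w)) for b in range(k)]
            for a in range(k)]
    xtwy = [sum(wi * r[a] * yi for r, yi, wi in zip(x, y, w)) for a in range(k)]
    return solve(xtwx, xtwy)


# Synthetic data from known coefficients (not study data):
# alpha, beta_1, beta_2, beta_3, beta_4, beta_5
TRUE = [10.0, 4.0, -2.0, 1.0, 3.0, 0.5]
x, y, w = [], [], []
for n, age_days in enumerate([-250, -150, -50, -10, 10, 50, 150, 250]):
    for hs in (0, 1):
        row = design_row(1 if age_days >= 0 else 0, hs, age_days)
        x.append(row)
        y.append(sum(t * v for t, v in zip(TRUE, row)))
        w.append(1.0 + 0.25 * n + 0.5 * hs)  # arbitrary stand-ins for IPT weights

beta = weighted_ols(x, y, w)
prek_effect = beta[1]                   # beta_1: OK pre-k, the reference group
differential = beta[2]                  # beta_2: Head Start minus OK pre-k
head_start_effect = beta[1] + beta[2]   # beta_1 + beta_2: Head Start
print(round(prek_effect, 6), round(differential, 6), round(head_start_effect, 6))
```

Because the synthetic outcomes contain no noise, the fit recovers the generating coefficients, which makes it easy to see that the Head Start effect is read off as the sum of the cutoff and interaction coefficients.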

Because the treatment effect comes from this discontinuity in outcomes at the birthdate 
cutoff for treatment, it is critical to check for an appropriate ‘bandwidth’, which involves an 
analysis of restricted samples of observations clustered around the cutoff within a range of the
assignment variable (e.g., +/- 90 days, 180 days) (Schochet et al., 2010; Van Der Klaauw, 2008).
The intuition behind this procedure is that the units close to the cutoff are likely to differ only in 
their exposure to the treatment, but those further from the cutoff might differ in additional ways. 
In our RD models we used a modest bandwidth restriction of 270 days (3/4 year) to ensure 
exchangeability in observations on either side of the treatment cutoff while also preserving 
power and precision in our relatively small treatment groups (Schochet et al., 2010). See 
Appendix 1.2 for further detail on our RD methodology and robustness tests. 

2. Addressing selection: Propensity score methodology 

The information in Table 1 shows that children’s characteristics differ between pathways. 
We use PS weighting methods to adjust for these observable differences. Propensity score 
weights induce comparability between Head Start and OK pre-k children, allowing us to make a 
statistical comparison of the two treatment effects in the same RD model. 

The PS is the predicted probability of a given exposure conditioned on a rich set of 
covariates. This score is then applied in analyses to reduce confounding between the exposure of 
interest and outcomes from observable factors (Heckman, Ichimura, & Todd, 1998; Rosenbaum 
& Rubin, 1983). A critical feature of PS methods is the assumption that there is no confounding 
due to unobserved variables. Because this assumption is untestable, we cannot be confident that 
our results represent causal estimates of the impact and differential effects of the preschool 
pathways. They are merely the best possible correlational estimates of our effects of interest. 

This is especially true in our study since we do not know why age 3 Head Start participants 
would choose pre-k over Head Start at age 4. Another assumption of PS methods in our
application is that the relationship between individual characteristics and treatment for both Head
Start and OK pre-k children follows the same functional form (i.e., a logistic response function).

One can implement PS methods in a number of ways, with matching methods being most
common (Caliendo & Kopeinig, 2008). In this study, we use a method based on Inverse
Probability of Treatment Weights (IPTW), a form of the Horvitz-Thompson survey sampling
weight (Foster, 2011). Weights are calculated as the inverse of the predicted probability of
receiving the exposure a person actually received (i.e., Treated group weights = 1/PS;
Comparison group weights = 1/(1-PS)). Because the PS is a summary of the observed covariates
used in the specification to predict an individual’s treatment status, this technique then inflates 
the importance of cases that are underrepresented in a given exposure to create comparable 
groups (i.e., by having a smaller value in the denominator of their IPTW). In this way, IPTWs 
create a pseudo-population in which selection bias from observed factors is removed and 
observations (children) are exchangeable between exposures (pathways). Our analyses use these 
IPTWs in the RD models described above. 
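
The weight calculation described above can be written in a few lines. This is a minimal sketch; the function name and the guard against scores of exactly 0 or 1 are assumptions, and the propensity scores themselves would come from a separately estimated model.

```python
def iptw_weight(ps: float, treated: bool) -> float:
    """Inverse probability of treatment weight.

    `ps` is the predicted probability of being in the treated group.
    Treated cases receive 1/PS and comparison cases receive 1/(1 - PS).
    """
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


# A treated child who was unlikely to be treated is weighted up...
print(iptw_weight(0.25, treated=True))
# ...while a comparison child with the same score stays close to 1.
print(iptw_weight(0.25, treated=False))
```

Cases that are underrepresented in the exposure they actually received have a small denominator and therefore a large weight, which is the inflation described in the text.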

After calculating the propensity scores for each age 3 Head Start graduate, we assessed 
whether there was common support across the age 4 OK pre-k and Head Start groups using the 
histograms shown in Figure 3. This indicated that there was adequate overlap in propensity 
scores, meaning that individuals in both treatment states were comparable with respect to their 
propensity for treatment (i.e., were exchangeable), allowing us to use PS methods. 

After implementing PS methods, it is critical to assess comparability in covariate means 
across exposure groups, referred to as balance checking. Our balance checking involved 
regressing each covariate on the exposure using the propensity score weights. The results are 
reported in columns 3 and 4 of Table 1, which shows the IPT-weighted group means for both 
pathways compared with the unweighted group means. An asterisk in the left column indicates a 
significant difference in proportions. The two groups become very similar with respect to 
observed covariates after weighting, and there are no remaining significant relationships between
Head Start or pre-k and the covariates. 
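
For a binary exposure, the coefficient from a weighted regression of a covariate on the exposure equals the difference in weighted group means, so the balance check can be illustrated by comparing means before and after weighting. The records below are made up, with propensity scores chosen to match the toy group composition; the field names and helper functions are hypothetical.

```python
def iptw(row):
    """IPT weight for a record with a propensity score `ps` for Head Start."""
    return 1.0 / row["ps"] if row["head_start"] else 1.0 / (1.0 - row["ps"])


def mean_difference(rows, covariate, use_weights):
    """Head Start mean minus OK pre-k mean of `covariate`."""
    def group_mean(in_head_start):
        group = [r for r in rows if r["head_start"] == in_head_start]
        weights = [iptw(r) if use_weights else 1.0 for r in group]
        total = sum(r[covariate] * wt for r, wt in zip(group, weights))
        return total / sum(weights)
    return group_mean(1) - group_mean(0)


# Toy sample: Hispanic children are three times as likely to be in Head Start.
toy = (
    [{"head_start": 1, "hispanic": 1, "ps": 0.75}] * 3
    + [{"head_start": 0, "hispanic": 1, "ps": 0.75}] * 1
    + [{"head_start": 1, "hispanic": 0, "ps": 0.25}] * 1
    + [{"head_start": 0, "hispanic": 0, "ps": 0.25}] * 3
)
unweighted_gap = mean_difference(toy, "hispanic", use_weights=False)
weighted_gap = mean_difference(toy, "hispanic", use_weights=True)
print(unweighted_gap, round(weighted_gap, 12))
```

In this constructed example the raw gap of 0.5 in the share of Hispanic children disappears once the weights are applied, which mirrors the pattern of shrinking differences between the observed and weighted columns of Table 1.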

Ideally, we would have additional variables in our propensity score equation to help us 
further capture a family’s preference for pre-k and Head Start (e.g., distance between children’s 
homes and OK pre-k and Head Start program sites). However, we use the same set of covariates 
that Gormley and colleagues (2011) use in their propensity score analysis study, matching 
children who attended OK pre-k to kindergarten children who did not attend either Head Start or 
OK pre-k (and analogously matching Head Start participants). These variables provide more 
detailed information on children and their families than ‘convenience’ variables alone (i.e., age, 
gender, race, marital status) (Shadish, Clark, Steiner, & Hill, 2008). In addition, propensity score
methods are better able to remove bias when comparing cases within the same locality and when 
study outcomes are short-term (proximate to selection), as is the case in our Tulsa sample 
(Bloom, Michalopoulos, Hill, & Lei, 2002). See Appendix 1 for further detail. 

Results 

Pathway effects 

Full model results are presented in Table 2, and the main findings are illustrated in Figure 
4. The coefficients in Table 2 represent changes in raw scores after participation in an age 4 
preschool program, estimated from PS-weighted RD models. Our key coefficients of interest are 
in the grey box at the top of the table that includes the calculated effect sizes shown below the 
standard error of the estimate. 

[Insert Table 2] 

We find that both age 4 programs improved children’s early reading and writing skills and 
neither program significantly improved children’s early math scores. The primary difference in
effects between the two preschool pathways was in children’s letter-word recognition, with a
significant difference in effect size of .46 indicating that the OK pre-k group shows treatment
effects twice as large as the age 4 Head Start group. Both preschool pathways improved
children’s early spelling scores equally well. 

The effect sizes for the WJ-Letter-word subtest at kindergarten entry are 0.92 for age 3 
Head Start graduates who attended OK pre-k at age 4, and 0.46 for children who stayed in Head 
Start at age 4. The effect sizes for the WJ-Spelling subtest are 0.68 for children who attended OK 
pre-k at age 4, and 0.53 for those who attended Head Start at age 4. The difference in effect 
sizes for spelling is not significant. 

[Figure 1 about here] 

Another way to test for dosage effects of a second year in Head Start would be to compare 
the outcomes of children who attended two years of Head Start to those that only attended one 
year. We tested this using the OK study data, comparing children who attended Head Start at 
age 4 to those who attended at both ages 3 and 4. We employed the same methodology as above, 
combining regression discontinuity and propensity score weighting. The results are shown in 
Appendix 2.3. Both the 1- and 2-year participants showed significant improvements in applied
problems (ES= .39, .46, respectively), but the improvements made by second-year Head Start 
children were not significantly larger than those of first-year children. There were no other 
significant effects of either pathway. 5

Descriptive comparison of classroom peers

In Appendix 2.2 we present the average assessment scores for the age 3 Head Start 
graduates measured at the beginning of their age 4 programs in 2006-07 (using the younger 
cohort) as a proxy for a post-age 3 Head Start assessment. 4 We compare the age 3 Head Start
graduates attending OK pre-k to those attending a second year of Head Start and find that the
two groups do not have significantly different letter-word and applied problems scores (p= 0.45,
0.50), but that second-year Head Start entrants have higher spelling scores (Standardized mean
difference (SMD)=0.27, p=0.00). This indicates that the two groups of children were
comparable in terms of most academic skills at the start of their age-4 program. However,
comparing the ability and characteristics of age 3 Head Start graduates with their classroom
peers in their age-4 programs who did not attend age 3 Head Start reveals more consistent 
differences. Age 3 Head Start graduates appear to have stronger early academic skills relative to 
their peers in age-4 Head Start, while the skills of those graduates attending OK pre-k are fairly 
similar to their peers. In addition, the peers of children in the OK pre-k are from higher income 
families. These comparisons indicate—at least descriptively—the potential for different peer 
effects for both the OK pre-k entrants and age-4 Head Start entrants (further detail in Appendix). 

Discussion 

Motivated by the increasing number of children entering Head Start at age 3 and the 
expansion of public preschool programs for children at age 4, the objective of this study was to 
answer the question: If children participate in Head Start at age 3, is it more beneficial for them 
to stay in the Head Start program at age 4 or to participate in a high quality, universal state
pre-kindergarten program at age 4? There was limited prior research on whether the Head Start
program is effective as a two-year program that builds upon what children learned at age 3, or 
whether Head Start is best thought of as a 1-year program that children can enter at age 3 or age 
4, with minimal incremental benefits from the second year of the program. To examine this 
issue, we compared two sets of age 3 and age 4 preschool exposure sequences that we called
pathways into kindergarten: 1) age 3 Head Start and age 4 OK pre-k, and 2) age 3 Head Start and 
age 4 Head Start. We employed a combination of strong quasi-experimental methods, using 
regression discontinuity to estimate the effects of both age-4 programs, and propensity score 
weighting to address selection into these two ‘pathways’ into kindergarten. 

Our findings suggest that children attending Head Start at age 3 will have stronger early 
reading skills if they attend a high quality universal pre-k program at age 4 rather than a second 
year of Head Start. We find that among Tulsa children attending Head Start at age 3, those
attending the OK pre-k program at age 4 have stronger letter-word recognition at kindergarten
entry than those attending Head Start again at age 4. The comparative effect of the
two age 4 programs was striking: the OK pre-k effect on letter and word identification skills was
roughly two times the effect size of the Head Start program (ES=0.98, 0.46, OK pre-k and
Head Start, respectively). OK pre-k and Head Start were equally effective at improving
children’s early writing and spelling skills (ES= 0.68, 0.53; no significant difference) and neither
program significantly improved children’s math skills.

Though the only significant differential effect we found in our study was on the letter-word (LW)
score, the effect size for the difference was substantial (.46): children who switched to OK pre-k
had twice the estimated effect size of their Head Start peers. A recent estimate of the disparity
in reading scores between kindergarten children in the top and bottom deciles of income is 1.25
standard deviations (Reardon, 2011). In terms of the achievement gap, then, the .46 effect we
find in our study would represent more than one-third of this disparity in early reading skills.
Note that the effect sizes for pre-k are similar to those found in other studies, particularly those 
of Gormley and colleagues on the OK pre-k program (0.2-0.9), and that the effect sizes for Head 
Start are larger than those found in the Head Start Impact Study experiments (0.2-0.3).
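
As a rough check on the achievement-gap comparison in this paragraph, the differential effect can be expressed as a share of the 1.25 standard deviation income gap, using only the two figures reported in the text:

$$
\frac{0.46}{1.25} = 0.368 \approx 37\%, \qquad \frac{1}{3} \approx 0.333 < 0.368
$$

The share is a little over one-third, which matches the statement above. It compares two standardized quantities from different samples, so it is best read as an order-of-magnitude illustration.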

These findings are consistent with other studies of dosage in early education that show
little to no marginal effect of a second year of an ECE program on child outcomes in the short
and long term (Arteaga et al., 2014; Reynolds, 1995; Reynolds et al., 2011; Schweinhart &
Weikart, 1981; Tarullo et al., 2013). There are several possible explanations for why age 3 Head
Start graduates in OK pre-k at age 4 outperform children who remain in Head Start at age 4. It
may be that the curricula used in Head Start classrooms do not adequately differentiate children’s 
age 3 and age 4 learning experiences. Because a majority of Head Start classrooms combine 3- 
and 4-year-olds, it is likely that age 3 Head Start graduates remain in the same classroom, with 
the same teacher and other materials during their second year. This may not provide Head Start 
children with the differentiated learning experiences that are essential to children’s intellectual 
development (Bronfenbrenner, 1989). Because the OK pre-k advantage was concentrated in
early reading outcomes, the instructional repetition may be specifically related to Head Start 
children’s exposure to new books or literacy activities in their second year. In contrast, the OK 
pre-k program may have provided novel age 4-specific learning experiences and materials, 
avoiding curriculum redundancy in a more academically focused environment. While there are 
numerous ways in which these program models differed, our study was not able to assess which 
of these program characteristics caused the observed difference because they are confounded 
with program type. However, this is an important avenue for future research. 

Furthermore, if programs are not designed to build on gains, they may show lower 
incremental impacts when measured towards the end of the program relative to children’s 
outcomes measured mid-program. Some ECE programs appear to have larger effects when 
assessments occur during implementation with effect sizes decreasing at the end of treatment, 
which occurred in the Abecedarian Project and Project CARE (Ramey, Bryant, Sparling, &
Wasik, 1985; Ramey et al., 2000). Children were assessed at the end of their age 4 program in
the OK preschool study, but for our research question, we ideally would have measured 
outcomes at the end of the age 3 program year. In this vein, the outcome measurement for the 1- 
year OK pre-k exposure would be timed to catch the maximal benefit of pre-k, but we would not 
know the contribution of age 3 Head Start without a post-age 3 Head Start measure. Measuring 
this ‘value-added’ from age 3 Head Start in both pathways could be particularly important if 
Head Start is not actually designed to be a 2-year program, and we may have underestimated the 
effects of Head Start for second-year students. 

It is also possible that peer effects in each of the age 4 preschool environments could have 
different and opposing effects on the age 4 learning experiences of age 3 Head Start graduates.
If second-year Head Start children have more advanced skills, acquired during the first year of
Head Start, than their new classmates, this could benefit the other first-time age 4 Head Start
children through peer learning. In this situation, age 3 Head Start graduates are benefactors of 
peer effects, while the age 3 Head Start graduates who attend OK pre-k at age 4 may become 
beneficiaries of positive peer effects because the OK pre-k program brings in children from 
higher income families with stronger school readiness skills. These two opposing effects could 
have reduced the identified impact of Head Start. While we could not empirically estimate the 
effects of peers, we conducted some descriptive analyses of the ability and characteristics of the 
peers of age 3 Head Start graduates. This suggested that the opposing peer effects hypotheses are 
plausible for both age-4 programs. 

Overall, our study suggests that these two preschool pathways may matter. However, the
specific reasons for why they may matter, and the extent to which they matter in different states
with different programs, must be studied in future research. Indeed, we did not find any
differences between the two programs in improving children’s early writing skills, and neither 
improved their early math skills. In other contexts, it is possible that there may be no differences 
in the sequencing of programs on children’s school readiness. Understanding the differences 
between these two pathways is important for policy, but we could not know the causal effects 
based on our study alone. 

The most substantial limitation of our study is that propensity score methods assume there 
is no unobserved confounding, which is not testable, and therefore our estimates do not represent 
causal effects. We also were not able to assess the specific mechanisms or program features 
through which OK pre-k produced better reading skills, and this must be addressed by future 
research. The other study limitations are as follows: 1) the Tulsa programs may not be 
representative of most state pre-k and Head Start programs because of Tulsa’s stringent quality 
standards and classroom quality ratings that are higher than national averages; 2) children living 
in Tulsa, OK are not representative of the broader population of children in the U.S.; 3) we 
cannot identify benefits from age 3 treatments beyond what is summarized into the scores of the 
age 4 assessment of the younger cohort in our sample; 4) our sample sizes may not provide
sufficient power to detect effects; 5) we cannot know why some parents took their children out of
Head Start in the second year; 6) we do not have other neighborhood or school-level information
about the representativeness of Head Start and OK pre-k program sites, nor do we have
classroom-level information about teachers’ curricular choices and instructional practices to
explore our hypotheses about differential instruction; 7) we do not have information about
children’s summer learning opportunities between their age 3 and age 4 programs; 8) we cannot
assess whether our RD estimates would be biased from sample attrition into kindergarten in the
younger cohort; and 9) Head Start and pre-k have different goals and may often serve different
populations. While Head Start supports child cognitive, emotional, and physical development for 
very low income children, pre-k programs often focus solely on academic activities to prepare 
children for school entry, and also may be offered to any child who is age-eligible regardless of 
income or need. 

Notes

1 The Head Start Impact Study did not include an experimental analysis of participating in one 
year versus two because children were able to select into receiving Head Start at age 4 after 
being randomly assigned to treatment at age 3. 

2 To our knowledge, the literature is unclear as to how one should handle missing data in a 
propensity score analysis. Because multiple imputation models the relationship between the 
outcomes, exposure and covariates simultaneously, this violates the analytic feature of PS 
whereby the relationship between the covariates and exposure and covariates and outcome are
separated. We attempted to implement Full Information Maximum Likelihood methods, but our
pathway sample sizes were not adequate to achieve convergence in these models. The Dummy 
Variable Adjustment approach (DVA) is biased if covariates with missing data and without 
missing data are correlated, but unbiased if uncorrelated with one another (Puma, Olsen, Bell, & 
Price, 2009). In our sample, these correlations were all below 0.1. We also tested the robustness 
of our DVA approach relative to multiple imputation (fully conditional specification; 50 imputed 
datasets) by estimating our RD models using both methods without weighting by the propensity 
scores. Both missing data strategies yield very similar coefficients and standard errors, with no 
major differences in significance on our focal treatment variables (shown in Appendix 2.12).

3 We checked for noncompliance with the age cutoff in the data and found very few children 
who did not comply with the treatment assignment rule (7 total). These children are omitted 
from the analysis. 

4 We assume that the selection mechanisms into OK pre-k or Head Start at age four do not vary 
between cohorts. 

5 The differences in propensity score weights constructed for the 1 vs. 2 years of Head Start 
analyses and the age 4 Head Start vs. OK pre-k analyses (for age 3 Head Start graduates) account 
for the differences in pathway effect sizes and significance across comparisons.

Table 1. Covariate balance between children who attended Age 3 Head Start + Age 4 OK
Pre-k and Age 3 Head Start + Age 4 Head Start in observed data and in Propensity Score
weighted data



                                            (1)          (2)          (3)          (4)
                                            Observed group means      PS weighted group means
                                            HS Age 3;    HS Age 3;    HS Age 3;    HS Age 3;    Diff
                                            OK Pre-k     HS Age 4     OK Pre-k     HS Age 4
                                            Age 4                     Age 4

Covariates
Reduced-price lunch                         0.10         0.03         0.06         0.06         0.00
White                                       0.10         0.09         0.10         0.10         0.00
Black                                       0.64         0.44         0.54         0.52         0.02
Hispanic                                    0.17         0.39         0.27         0.30         0.03
Asian/Native/Other                          0.08         0.07         0.08         0.07         0.01
Female                                      0.50         0.55         0.52         0.52         0.00
Below High School                           0.08         0.19         0.12         0.15         0.03
High School                                 0.30         0.28         0.31         0.29         0.02
Some college                                0.32         0.32         0.31         0.32         0.10
College +                                   0.09         0.05         0.07         0.07         0.00
Child had some non-parental care at age 3   0.55         0.46         0.52         0.50         0.02
Internet access in home                     0.33         0.29         0.30         0.32         0.02
Number of books in home (1-5 scale)         1.86         1.93         1.93         1.94         0.00
Parent is foreign-born                      0.28         0.43         0.36         0.36         0.00
English is home language                    0.71         0.59         0.65         0.65         0.00
Child has health insurance                  0.79         0.78         0.78         0.80         0.02
Married                                     0.26         0.36         0.31         0.33         0.02
Child tested in both English and Spanish    0.11         0.31         0.20         0.24         0.04
Father lives in home                        0.35         0.44         0.40         0.41         0.01
Full day OK pre-k                           0.92         -            0.92         -
Parent education missing                    0.21         0.16         0.19         0.17         0.02
Non-parental care missing                   0.26         0.29         0.27         0.27         0.00
Internet missing                            0.22         0.16         0.20         0.17         0.03
Books in home missing                       0.21         0.17         0.19         0.17         0.02
Foreign-born parent missing                 0.10         0.06         0.08         0.08         0.00
Home language missing                       0.16         0.13         0.15         0.13         0.02
Health insurance missing                    0.17         0.13         0.16         0.13         0.03
Marital status missing                      0.21         0.18         0.19         0.18         0.01
Father status missing                       0.22         0.17         0.20         0.18         0.02
Health status missing                       0.16         0.13         0.15         0.13         0.02
Medical visit missing                       0.16         0.13         0.15         0.13         0.02

Outcomes
Assessment at Kindergarten entry
WJ Letter-Word raw score - Cohort 1         10.51        7.98         10.35        8.08
                                            (4.06)       (4.06)       (4.14)       (4.01)
WJ Applied Problems raw score - Cohort 1    13.15        12.95        13.03        12.62
                                            (3.97)       (3.94)       (3.90)       (4.08)
WJ Spelling raw score - Cohort 1            9.06         8.53         9.05         8.46
                                            (2.90)       (2.41)       (2.96)       (2.40)
Assessment at Age 4 program entry
WJ Letter-Word raw score - Cohort 2         4.55         4.81         4.53         4.82
                                            (3.14)       (3.14)       (3.12)       (4.03)
WJ Applied Problems raw score - Cohort 2    8.39         8.00         8.42         7.86
                                            (4.76)       (4.66)       (4.55)       (4.63)
WJ Spelling raw score - Cohort 2            4.25         5.04         4.54         4.86
                                            (2.14)       (3.03)       (2.55)       (3.10)

Observations                                211          329          211          329



Notes: HS = Head Start. Sample restricted to children who are free and reduced-price lunch eligible. Cohort 1 refers to the
group of children who participated in OK pre-k or Head Start during the 2005-06 school year and are entering kindergarten 
at the time of the assessment, the start of the 2006-07 school year. Cohort 2 refers to the group of children who are entering 
OK pre-k or Head Start in the 2006-07 school year. Diff refers to differences in proportions or means, where * p<.05. 
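
The weighted group means and the Diff column in Table 1 can be illustrated with a short sketch. It assumes that Diff is the absolute difference between the two weighted group means, which is consistent with the non-negative values reported; the function names and the example data are hypothetical, and the weights stand in for propensity score weights without reproducing how the study constructed them.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of a covariate; equal weights give the observed mean."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def balance_difference(
    x_a: Sequence[float], w_a: Sequence[float],
    x_b: Sequence[float], w_b: Sequence[float],
) -> float:
    """Absolute difference in weighted means between two groups."""
    return abs(weighted_mean(x_a, w_a) - weighted_mean(x_b, w_b))


# Hypothetical 0/1 covariate in two pathway groups.
prek_x, prek_w = [1, 0, 1, 1], [1, 1, 1, 1]
hs_x, hs_w = [1, 0, 0, 0], [3, 1, 1, 1]

observed_diff = balance_difference(prek_x, [1] * 4, hs_x, [1] * 4)
weighted_diff = balance_difference(prek_x, prek_w, hs_x, hs_w)
```

In this toy example the weights shrink the gap between groups from 0.50 to 0.25, which is the pattern Table 1 reports when moving from columns (1)-(2) to columns (3)-(4).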



Table 2. Propensity Score weighted Regression Discontinuity results for the effects of
Age 3 Head Start + Age 4 OK Pre-k vs. Age 3 Head Start + Age 4 Head Start

                                            Letter-Word       Applied Problems  Spelling
                                            B (se) d          B (se) d          B (se) d

Age 4 OK Pre-k & Age 3 HS effect            3.77***           0.69              2.17***
(cutoff)                                    (1.03)            (1.10)            (0.73)
Effect size                                 0.92              0.14              0.68
Age 4 HS & Age 3 HS effect                  1.88*             1.36              1.72**
(Pathway*cutoff + cutoff)                   (0.98)            (1.06)            (0.72)
Effect size                                 0.46              0.27              0.53
Age 4 HS & Age 3 HS differential effect     -1.89**           0.66              -0.45
(Pathway*cutoff)                            (0.88)            (1.09)            (0.69)
Effect size and direction of difference     -0.46             +0.13             -0.14
p-value of difference                       0.02              0.54              0.51

Covariates
Age 4 HS & Age 3 HS                         0.23              -0.40             0.35
                                            (0.53)            (0.76)            (0.45)
Age as distance from treatment cutoff       0.0048*           0.010***          0.0064***
                                            (0.0029)          (0.0029)          (0.0020)
Age squared                                 0.0000059         0.0000080         0.00000063
                                            (0.0000097)       (0.000010)        (0.0000069)
Female                                      0.68*             0.45              0.82***
                                            (0.39)            (0.44)            (0.29)
Child had some non-parental care at age 3   0.0014            0.56              0.40
                                            (0.57)            (0.73)            (0.39)
Reduced-price lunch                         -0.47             -1.86*            -0.87
                                            (0.77)            (0.93)            (0.72)
Maternal Education
Below High School                           -0.67             0.63              -0.077
                                            (0.52)            (0.60)            (0.39)
Some college                                0.71              1.42**            0.90**
                                            (0.57)            (0.58)            (0.45)
College +                                   1.38              1.99**            0.66
                                            (0.88)            (0.98)            (0.57)
Child race
Black                                       1.28**            1.02              0.92*
                                            (0.61)            (0.81)            (0.47)
Hispanic                                    0.41              0.74              1.94***
                                            (0.66)            (0.82)            (0.58)
Asian/Native/Other                          0.53              2.10**            0.44
                                            (0.78)            (0.97)            (0.59)
Missing parent education                    -0.22             -0.46             0.17
                                            (0.68)            (0.73)            (0.53)
Missing non-parental care                   1.19              1.45**            0.60
                                            (0.62)            (0.66)            (0.42)
Constant                                    3.60***           7.70***           3.45***
                                            (0.82)            (1.10)            (0.69)

Observations                                407               404               391


*** significant at .01 level, ** significant at .05 level, * significant at .10 level. Reference group for effect of exposure is age 4 OK
Pre-k + age 3 Head Start. Observations that fall within the 270-day bandwidth from the treatment cutoff are included (Age -
birthdate cutoff <= 270 in absolute value). Outcome variable is a raw score. All models use SEs clustered by teacher.
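
Two pieces of arithmetic behind Table 2 can be sketched briefly. The row labels indicate that the Age 4 Head Start effect is the sum of the cutoff coefficient and the pathway-by-cutoff interaction, and the notes state the 270-day bandwidth rule. The function names are hypothetical, and the sketch does not re-estimate anything; it only recombines coefficients as reported.

```python
def within_bandwidth(age_minus_cutoff_days: float,
                     bandwidth_days: float = 270.0) -> bool:
    """Bandwidth rule from the table notes: |age - birthdate cutoff| <= 270."""
    return abs(age_minus_cutoff_days) <= bandwidth_days


def head_start_pathway_effect(cutoff_coef: float,
                              interaction_coef: float) -> float:
    """Effect for the Head Start pathway: cutoff + pathway-by-cutoff term."""
    return cutoff_coef + interaction_coef


# Reported Letter-Word and Applied Problems coefficients from Table 2.
letter_word_hs = head_start_pathway_effect(3.77, -1.89)
applied_problems_hs = head_start_pathway_effect(0.69, 0.66)
```

The Letter-Word sum reproduces the reported 1.88. The Applied Problems sum is 1.35 against a reported 1.36, a gap of the size expected from rounding the inputs to two decimals. The standard error of the sum also depends on the covariance between the two coefficient estimates, which the table does not report, so the sketch does not attempt it.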
Figure 1: Histogram and McCrary Density plot of age by treatment status
A) Histogram of age by treatment status and preschool pathway
B) McCrary Density plot of age

Caption for Figure 1
A) X-axis indicates children’s age in years on September 1st, 2005 (i.e., the start of the 2005-2006
school year); bars represent the percent of the sample for each age. These four histograms illustrate that
children’s treatment status is a discontinuous function of their age, with the break at four years.

B) X-axis indicates children’s age in years on September 1st, 2005. The graph shows the McCrary
(2008) test for a discontinuity in the density of children near the birthdate cutoff for both pathways
combined. Test results indicate no significant discontinuity in the density of children near the cutoff
(Theta=0.10, t-statistic=0.88, p-value=0.19).
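
The idea behind a density test at the cutoff can be illustrated with a much cruder check than the one used here. The sketch below is not McCrary's estimator, which smooths a finely binned histogram with local linear regression on each side of the cutoff; it simply counts children in a narrow window on either side and applies an exact binomial test of equal counts. The function names, the window width, and the example data are all hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def counts_near_cutoff(distances: Sequence[float],
                       window: float) -> Tuple[int, int]:
    """Count observations just below and at/above the cutoff (distance 0)."""
    below = sum(1 for d in distances if -window <= d < 0)
    above = sum(1 for d in distances if 0 <= d <= window)
    return below, above


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials under p = 0.5."""
    pmf = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probability of every outcome at least as unlikely as the
    # observed one; the small tolerance guards against float ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-12)))


# Hypothetical ages as distance from the cutoff, in days.
distances = [-30, -20, -10, -5, 5, 10, 20, 30, 40]
below, above = counts_near_cutoff(distances, window=35)
p_value = binomial_two_sided_p(below, below + above)
```

A large p-value in such a check is consistent with no sorting around the cutoff, but it has far less power and rigor than the McCrary test reported in the caption.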
Figure 2: Histograms of the assignment variable and selected covariates within a 90-day bandwidth of
the treatment cutoff

a) Age of study sample relative to the treatment cutoff
b) Proportion of study sample Hispanic relative to the treatment cutoff
c) Proportion of study sample reduced-price lunch eligible relative to the treatment cutoff


[Plot: x-axis, age as distance from the treatment cutoff (-100 to 100); series, Comparison and Treatment]

d) Proportion of study sample parents with High School degree or higher

Captions for Figure 2

A) Y-axis indicates the percent of children within a birthdate range around the treatment cutoff in the 
study sample. This figure shows that the distributions of children’s ages are similar on both sides of the 
cutoff (i.e., no clustering at the cutoff). Children in cohort 1 are shown on the right-hand side of the 
figure (treatment), and children in cohort 2 are shown on the left-hand side (comparison).

B-D) Y-axes indicate the percent of children within a birthdate range around the treatment cutoff with
the identified characteristic (Hispanic, reduced-price lunch eligible, and High School degree or higher,
respectively). These figures illustrate that the distributions of children’s observable
characteristics are similar on both sides of the cutoff. Children in cohort 1 are shown on the right-hand
side of the figures (treatment), and children in cohort 2 are shown on the left-hand side (comparison).

Figure 3: Histogram of Propensity Scores to assess common support between age 4 treatment states

Caption for Figure 3

Bar height indicates the proportion of children at each value of the propensity score for the age 4
OK pre-k and age 4 Head Start groups to assess common support. These overlay histograms show that 
there is adequate overlap in propensity scores, meaning that individuals in both preschool pathways 
were comparable with respect to their propensity for treatment. 
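
Beyond inspecting overlaid histograms, common support can be summarized numerically. The sketch below uses a simple min-max overlap rule, which is one common convention and an assumption here; the source reports only the visual assessment. The function names and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def common_support(ps_a: Sequence[float],
                   ps_b: Sequence[float]) -> Tuple[float, float]:
    """Range of propensity scores observed in both groups (min-max rule)."""
    lower = max(min(ps_a), min(ps_b))
    upper = min(max(ps_a), max(ps_b))
    if lower > upper:
        raise ValueError("the two groups have no overlapping scores")
    return lower, upper


def share_on_support(scores: Sequence[float],
                     support: Tuple[float, float]) -> float:
    """Fraction of a group's scores that fall inside the common range."""
    lower, upper = support
    return sum(1 for s in scores if lower <= s <= upper) / len(scores)


# Hypothetical propensity scores for the two age 4 groups.
prek_scores = [0.2, 0.4, 0.6, 0.8]
hs_scores = [0.1, 0.3, 0.5, 0.7]

support = common_support(prek_scores, hs_scores)
prek_share = share_on_support(prek_scores, support)
hs_share = share_on_support(hs_scores, support)
```

Shares close to 1 in both groups correspond to the "adequate overlap" described in the caption.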


Caption for Figure 4 

Bars represent preschool exposure effect sizes for each outcome. Brackets indicate the significance of 
the difference in effect sizes between the two preschool pathways.